FW 8051: Slide Presentations – HW 03: Non-linear and GLS models

Polynomials versus splines

Both are acceptable and can capture the non-linear relationship between height and age, but a quadratic is a parabola: its fitted curve must eventually bend the same way (up or down) at both ends of the age range.

Code

library(ggplot2)

# Quadratic fit of height on age, drawn separately for each sex
ggplot(ElephantsMF, aes(x = Age, y = Height, color = Sex)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = TRUE) +
  xlab("Age (Years)") + ylab("Shoulder Height (cm)")

Plot of height versus age with quadratic fit

Code

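library(splines)  # provides ns()

# Natural cubic regression spline (3 df) fit, drawn separately for each sex
ggplot(ElephantsMF, aes(x = Age, y = Height, color = Sex)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ ns(x, 3), se = TRUE) +
  xlab("Age (Years)") + ylab("Shoulder Height (cm)")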

Plots

Mapping color to Sex makes geom_smooth() fit a separate curve for each sex, so the plot shows an interaction model:

lm.ele <- lm(Height ~ Sex*poly(Age, 2), data = ElephantsMF)

That is not the same as this additive model:

lm.ele <- lm(Height ~ Sex + poly(Age, 2), data = ElephantsMF)
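Written out, the two formulas differ only in the Sex-by-Age terms. The sketch below assumes R's default treatment coding with F as the reference level of Sex, and uses \(P_1\) and \(P_2\) as shorthand (not notation from the slides) for the two orthogonal-polynomial columns created by poly(Age, 2).

\[\text{Interaction: } E[Height_i] = \beta_0 + \beta_1 I(Sex_i = M) + \beta_2 P_1(Age_i) + \beta_3 P_2(Age_i) + \beta_4 I(Sex_i = M)P_1(Age_i) + \beta_5 I(Sex_i = M)P_2(Age_i)\]

\[\text{Additive: } E[Height_i] = \beta_0 + \beta_1 I(Sex_i = M) + \beta_2 P_1(Age_i) + \beta_3 P_2(Age_i)\]

Under the additive model the male and female curves have the same shape and differ only by the vertical shift \(\beta_1\). Under the interaction model each sex gets its own quadratic.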

Code

ggplot(ElephantsMF, aes(x = Age, y = Height, color = Sex)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = TRUE) +
  xlab("Age (Years)") + ylab("Shoulder Height (cm)")

Code

# Prediction grid: every combination of Sex and Age (expand.grid already returns a data frame)
newdata <- expand.grid(Sex = c("M", "F"), Age = seq(0, 33, by = 1))

# Predicted heights from lm.ele
newdata$phat <- predict(lm.ele, newdata = newdata)

ggplot(ElephantsMF, aes(x = Age, y = Height, color = Sex)) +
  geom_point() +
  geom_line(data = newdata, aes(Age, phat, col = Sex), lty = 2, lwd = 2) +
  xlab("Age (Years)") + ylab("Shoulder Height (cm)") +
  theme_bw()

Plot of height versus age with quadratic fits

Number of knots (or df): Cubic regression splines

Could in principle compare models (e.g., using AIC) that have varying numbers of knots, or different knot locations

Danger of overfitting, and difficult to account for model-selection uncertainty

Choose a small number of knots (df), based on how much data you have and how complex you expect the relationship to be a priori

2 or 3 internal knots are usually sufficient for small data sets
Keele (2008), cited in Zuur et al., recommends 3 knots if \(n < 30\) and 5 knots if \(n > 100\)
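The AIC comparison mentioned above could be sketched as follows. This is a minimal illustration, not a recommended workflow: it assumes ElephantsMF comes from the Stat2Data package (the slides do not say where it is loaded from), and the additive model form and the candidate range df = 2 to 5 are choices made here for illustration.

```r
library(splines)    # ns(): natural cubic regression splines
library(Stat2Data)  # assumption: source package for ElephantsMF

data(ElephantsMF)

# Candidate spline complexities (illustrative range, not from the slides)
dfs <- 2:5

# Fit one additive model per candidate df
fits <- lapply(dfs, function(k) {
  lm(Height ~ Sex + ns(Age, df = k), data = ElephantsMF)
})

# Tabulate AIC and the difference from the best (smallest) AIC
aic_tab <- data.frame(df = dfs, AIC = sapply(fits, AIC))
aic_tab$dAIC <- aic_tab$AIC - min(aic_tab$AIC)
aic_tab
```

Small AIC differences between neighboring df values are not strong evidence for the more flexible model, and inference from whichever model is picked ignores the selection step. That is the argument for fixing a small df a priori.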

Linear regression versus GLS models

Linear regression:

\[TF_i \sim N(\mu_i, \sigma^2)\] \[\mu_i = \beta_0 + \beta_1DBH_i\]

Minimizes \(\sum_i (TF_i - \beta_0 - \beta_1DBH_i)^2\)

GLS varPower model:

\[TF_i \sim N(\mu_i, \sigma_i^2)\] \[\mu_i = \beta_0 + \beta_1DBH_i\] \[\sigma_i^2 = \sigma^2|DBH_i|^{2\delta}\]

Minimizes (for a given \(\delta\)): \(\sum_i \frac{(TF_i - \beta_0 - \beta_1DBH_i)^2}{\sigma^2|DBH_i|^{2\delta}}\)

Estimates (SE) from the two models.
Parameter	Linear model	varPower
(Intercept)	0.196 (0.280)	0.028 (0.113)
DBH	0.384 (0.013)	0.393 (0.010)
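A side-by-side comparison like the one in the table can be produced with lm() and nlme::gls(). The sketch below uses simulated stand-in data, because the homework data set is not shown here; the sample size, coefficients and noise level are hypothetical, so the output will not match the table.

```r
library(nlme)  # gls() and varPower()

set.seed(1)

# Simulated stand-in data (hypothetical values, NOT the homework data):
# the residual SD grows in proportion to DBH, i.e. true delta = 1
n   <- 200
DBH <- runif(n, min = 5, max = 50)
TF  <- 0.1 + 0.4 * DBH + rnorm(n, sd = 0.05 * DBH)
dat <- data.frame(TF = TF, DBH = DBH)

# Constant-variance linear regression
lm_fit <- lm(TF ~ DBH, data = dat)

# GLS with Var(TF_i) = sigma^2 * |DBH_i|^(2 * delta)
gls_fit <- gls(TF ~ DBH, weights = varPower(form = ~ DBH), data = dat)

# Estimates and standard errors from both fits
est <- data.frame(
  term    = names(coef(lm_fit)),
  lm_est  = coef(summary(lm_fit))[, "Estimate"],
  lm_se   = coef(summary(lm_fit))[, "Std. Error"],
  gls_est = summary(gls_fit)$tTable[, "Value"],
  gls_se  = summary(gls_fit)$tTable[, "Std.Error"]
)
est

# Estimated variance power (delta)
delta_hat <- coef(gls_fit$modelStruct$varStruct, unconstrained = FALSE)
delta_hat
```

Both fits target the same mean line. The GLS fit downweights the observations with large DBH, where the variance is larger, which is why its standard errors can differ from those of the linear model.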

GLS models

Plot of TF versus DBH with the two model fits overlaid